65 research outputs found

    Discourse-aware neural machine translation

    Get PDF
    Machine translation (MT) models usually translate a text by considering isolated sentences based on a strict assumption that the sentences in a text are independent of one another. However, it is a truism that texts have properties of connectedness that go beyond those of their individual sentences. Disregarding dependencies across sentences will harm translation quality especially in terms of coherence, cohesion, and consistency. Previously, some discourse-aware approaches have been investigated for conventional statistical machine translation (SMT). However, this is a serious obstacle for the state-of-the-art neural machine translation (NMT), which recently has surpassed the performance of SMT. In this thesis, we try to incorporate useful discourse information for enhancing NMT models. More specifically, we conduct research on two main parts: 1) exploiting novel document-level NMT architecture; and 2) dealing with a specific discourse phenomenon for translation models. Firstly, we investigate the influence of historical contextual information on the perfor- mance of NMT models. A cross-sentence context-aware NMT model is proposed to consider the influence of previous sentences in the same document. Specifically, this history is summarized using an additional hierarchical encoder. The historical representations are then integrated into the standard NMT model in different strategies. Experimental results on a Chinese–English document-level translation task show that the approach significantly improves upon a strong attention-based NMT system by up to +2.1 BLEU points. In addition, analysis and comparison also give insightful discussions and conclusions for this research direction. Secondly, we explore the impact of discourse phenomena on the performance of MT. In this thesis, we focus on the phenomenon of pronoun-dropping (pro-drop), where, in pro-drop languages, pronouns can be omitted when it is possible to infer the referent from the context. As the data for training a dropped pronoun (DP) generator is scarce, we propose to automatically annotate DPs using alignment information from a large parallel corpus. We then introduce a hybrid approach: building a neural-based DP generator and integrating it into the SMT model. Experimental results on both Chinese–English and Japanese–English translation tasks demonstrate that our approach achieves a significant improvement of up to +1.58 BLEU points with 66% F-score for DP generation accuracy. Motivated by this promising result, we further exploit the DP translation approach for advanced NMT models. A novel reconstruction-based model is proposed to reconstruct the DP-annotated source sentence from the hidden states of either encoder or decoder, or both components. Experimental results on the same translation tasks show that the proposed approach significantly and consistently improves translation performance over a strong NMT baseline, which is trained on DP-annotated parallel data. To avoid the errors propagated from an external DP prediction model, we finally investigate an end-to-end DP translation model. Specifically, we improve the reconstruction-based model from three perspectives. We first employ a shared reconstructor to better exploit encoder and decoder representations. Secondly, we propose to jointly learn to translate and predict DPs. In order to capture discourse information for DP prediction, we finally combine the hierarchical encoder with the DP translation model. Experimental results on the same translation tasks show that our approach significantly improves both translation performance and DP prediction accuracy

    New Trends in Machine Translation using Large Language Models: Case Examples with ChatGPT

    Full text link
    Machine Translation (MT) has made significant progress in recent years using deep learning, especially after the emergence of large language models (LLMs) such as GPT-3 and ChatGPT. This brings new challenges and opportunities for MT using LLMs. In this paper, we brainstorm some interesting directions for MT using LLMs, including stylized MT, interactive MT, and Translation Memory-based MT, as well as a new evaluation paradigm using LLMs. We also discuss the privacy concerns in MT using LLMs and a basic privacy-preserving method to mitigate such risks. To illustrate the potential of our proposed directions, we present several examples for the new directions mentioned above, demonstrating the feasibility of the proposed directions and highlight the opportunities and challenges for future research in MT using LLMs

    ngram-OAXE: Phrase-Based Order-Agnostic Cross Entropy for Non-Autoregressive Machine Translation

    Full text link
    Recently, a new training oaxe loss has proven effective to ameliorate the effect of multimodality for non-autoregressive translation (NAT), which removes the penalty of word order errors in the standard cross-entropy loss. Starting from the intuition that reordering generally occurs between phrases, we extend oaxe by only allowing reordering between ngram phrases and still requiring a strict match of word order within the phrases. Extensive experiments on NAT benchmarks across language pairs and data scales demonstrate the effectiveness and universality of our approach. %Further analyses show that the proposed ngram-oaxe alleviates the multimodality problem with a better modeling of phrase translation. Further analyses show that ngram-oaxe indeed improves the translation of ngram phrases, and produces more fluent translation with a better modeling of sentence structure.Comment: COLING 2022 Oral. arXiv admin note: text overlap with arXiv:2106.0509

    Revisiting Non-Autoregressive Translation at Scale

    Full text link
    In real-world systems, scaling has been critical for improving the translation quality in autoregressive translation (AT), which however has not been well studied for non-autoregressive translation (NAT). In this work, we bridge the gap by systematically studying the impact of scaling on NAT behaviors. Extensive experiments on six WMT benchmarks over two advanced NAT models show that scaling can alleviate the commonly-cited weaknesses of NAT models, resulting in better translation performance. To reduce the side-effect of scaling on decoding speed, we empirically investigate the impact of NAT encoder and decoder on the translation performance. Experimental results on the large-scale WMT20 En-De show that the asymmetric architecture (e.g. bigger encoder and smaller decoder) can achieve comparable performance with the scaling model, while maintaining the superiority of decoding speed with standard NAT models. To this end, we establish a new benchmark by validating scaled NAT models on the scaled dataset, which can be regarded as a strong baseline for future works. We release code and system outputs at https://github.com/DeepLearnXMU/Scaling4NAT.Comment: 13 pages, Findings of ACL 202
    corecore